61 research outputs found

    Depth Enhancement and Surface Reconstruction with RGB/D Sequence

    Surface reconstruction and 3D modeling are challenging tasks that have been explored for decades by the computer vision, computer graphics, and machine learning communities. They are fundamental to many applications such as robot navigation, animation and scene understanding, industrial control, and medical diagnosis. In this dissertation, I take advantage of consumer depth sensors for surface reconstruction. Considering their limited performance in capturing detailed surface geometry, a depth enhancement approach is first proposed to recover small and rich geometric details from the captured depth and color sequence. In addition to enhancing spatial resolution, I present a hybrid camera to improve the temporal resolution of consumer depth sensors and propose an optimization framework to capture high-speed motion and generate high-speed depth streams. Given the partial scans from the depth sensor, we also develop a novel fusion approach to build complete and watertight human models with a template-guided registration method. Finally, the problem of surface reconstruction for non-Lambertian objects, on which current depth sensors fail, is addressed by exploiting multi-view images captured with a hand-held color camera, and we propose a visual-hull-based approach to recover the 3D model.

    Multiple depth maps integration for 3D reconstruction using geodesic graph cuts

    Depth images, in particular depth maps estimated from stereo vision, may contain a substantial number of outliers and result in inaccurate 3D modelling and reconstruction. To address this issue, this paper proposes a graph-cut-based approach that integrates multiple depth maps to obtain smooth and watertight surfaces. First, confidence maps for the depth images are estimated to suppress noise, and reliable patches covering the object surface are determined from them. These patches are then exploited to estimate the path weight for 3D geodesic distance computation, where an adaptive regional term is introduced to deal with the “shorter-cuts” problem caused by the minimal surface bias. Finally, the adaptive regional term and the boundary term constructed from the patches are combined in the graph-cut framework for more accurate and smoother 3D modelling. We demonstrate the superior performance of our algorithm on the well-known Middlebury multi-view database and additionally on real-world multiple depth images captured by Kinect. The experimental results show that our method preserves object protrusions and details while maintaining surface smoothness.

    A Generative Human-Robot Motion Retargeting Approach Using a Single RGBD Sensor

    The goal of human-robot motion retargeting is to let a robot follow the movements performed by a human subject. In previous approaches, the human poses are typically precomputed by a human pose tracking system, after which explicit joint mapping strategies are specified to apply the estimated poses to a target robot. However, there is no generic mapping strategy that can map human joints to robots with different kinds of configurations. In this paper, we present a novel motion retargeting approach that combines human pose estimation and the motion retargeting procedure in a unified generative framework without relying on any explicit mapping. First, a 3D parametric human-robot (HUMROB) model is proposed, which has the same joint and stability configurations as the target robot while its shape conforms to the source human subject. The robot configurations, including its skeleton proportions, joint limitations, and degrees of freedom (DoFs), are enforced in the HUMROB model and preserved during the tracking procedure. A single RGBD camera monitors the human pose, and the raw RGB and depth sequence serves as input. The HUMROB model is deformed to fit the input point cloud, from which the joint angles of the model are calculated and applied to the target robot for retargeting. In this way, instead of being fitted individually for each joint, the joint angles of the robot are fitted globally so that the surface of the deformed model is as consistent as possible with the input point cloud. As a result, no explicit or pre-defined joint mapping strategies are needed. To demonstrate its effectiveness for human-robot motion retargeting, the approach is tested both in simulation and on real robots whose skeleton configurations and DoFs differ considerably from those of the source human subjects.

    Event-based Human Pose Tracking by Spiking Spatiotemporal Transformer

    The event camera, an emerging biologically inspired vision sensor for capturing motion dynamics, presents new potential for 3D human pose tracking, or video-based 3D human pose estimation. However, existing works in pose tracking either require additional gray-scale images to establish a solid starting pose, or ignore temporal dependencies altogether by collapsing segments of event streams into static event frames. Meanwhile, although the effectiveness of Artificial Neural Networks (ANNs, a.k.a. dense deep learning) has been showcased in many event-based tasks, the use of ANNs tends to neglect the fact that, compared to dense frame-based image sequences, the events from an event camera are spatiotemporally much sparser. Motivated by the above-mentioned issues, we present in this paper a dedicated end-to-end sparse deep learning approach for event-based pose tracking: 1) to our knowledge this is the first time that 3D human pose tracking is obtained from events only, thus eliminating the need to access any frame-based images as part of the input; 2) our approach is based entirely upon the framework of Spiking Neural Networks (SNNs), and consists of a Spike-Element-Wise (SEW) ResNet and a novel Spiking Spatiotemporal Transformer; 3) a large-scale synthetic dataset, named SynEventHPD, is constructed that features a broad and diverse set of annotated 3D human motions, as well as longer hours of event stream data. Empirical experiments demonstrate that, with superior performance over the state-of-the-art (SOTA) ANN counterparts, our approach also achieves a significant computation reduction of 80% in FLOPS. Furthermore, our proposed method also outperforms SOTA SNNs in the regression task of human pose tracking. Our implementation is available at https://github.com/JimmyZou/HumanPoseTracking_SNN and the dataset will be released upon paper acceptance.
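
    To make the SNN vocabulary in this abstract concrete, the following is a minimal sketch of a discrete-time leaky integrate-and-fire (LIF) neuron and an element-wise ADD residual connection of the kind used in SEW-style blocks. It is not the authors' implementation: the function names, the time constant `tau`, the threshold `v_th`, and the hard reset are all assumptions chosen for illustration.

```python
def lif_step(v, x, tau=2.0, v_th=1.0, v_reset=0.0):
    """One discrete-time step of a leaky integrate-and-fire neuron.

    v is the membrane potential before the step and x is the input current.
    Returns (spike, new_v), where spike is 0 or 1.
    All parameter values are illustrative assumptions.
    """
    # Leaky integration: the potential moves toward the input with rate 1/tau.
    v = v + (x - (v - v_reset)) / tau
    if v >= v_th:
        # Fire and hard-reset the membrane potential.
        return 1, v_reset
    return 0, v


def run_lif(inputs, tau=2.0, v_th=1.0, v_reset=0.0):
    """Run one LIF neuron over a sequence of input currents; return its spike train."""
    v = v_reset
    spikes = []
    for x in inputs:
        s, v = lif_step(v, x, tau, v_th, v_reset)
        spikes.append(s)
    return spikes


def sew_add(branch_spikes, identity_spikes):
    """Element-wise ADD of a residual branch's spikes and the identity path's spikes."""
    return [a + b for a, b in zip(branch_spikes, identity_spikes)]


if __name__ == "__main__":
    branch = run_lif([1.5, 1.5, 1.5, 1.5])
    print(branch)
    print(sew_add(branch, [1, 1, 0, 0]))
```

    With a constant input the neuron fires only on some time steps, and with zero input it never fires, which illustrates the sparsity that the abstract contrasts with dense ANN computation.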

    Dynamic Non-Rigid Objects Reconstruction with a Single RGB-D Sensor

    This paper deals with the 3D reconstruction problem for dynamic non-rigid objects with a single RGB-D sensor. The task is challenging because of the almost inevitable error accumulation in previous sequential fusion methods and the possible failure of surface tracking over a long sequence. We therefore propose a global non-rigid registration framework and tackle the drifting problem via an explicit loop closure. Our scheme starts with a fusion step to obtain multiple partial scans from the input sequence, followed by a pairwise non-rigid registration and loop detection step to obtain correspondences between neighboring partial pieces and between pieces that form a loop. Then, we perform a global registration procedure to align all the pieces into a consistent canonical space, guided by the matches that we have established. Finally, our proposed model-update step helps fix potential misalignments that still exist after the global registration. Both geometric and appearance constraints are enforced during alignment; therefore, the recovered model has accurate geometry as well as high-fidelity color maps for the mesh. Experiments on both synthetic and various real datasets demonstrate the capability of our approach to reconstruct complete and watertight deformable objects.

    TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration

    We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities. Unlike existing works that generate dance movements using a single modality such as music, our goal is to produce richer dance movements guided by the instructive information provided by the text. However, the lack of paired motion data with both music and text modalities limits the ability to generate dance movements that integrate both. To alleviate this challenge, we propose to utilize a 3D human motion VQ-VAE to project the motions of the two datasets into a latent space consisting of quantized vectors, which effectively mixes the motion tokens from the two datasets with different distributions for training. Additionally, we propose a cross-modal transformer to integrate text instructions into the motion generation architecture for generating 3D dance movements without degrading the performance of music-conditioned dance generation. To better evaluate the quality of the generated motion, we introduce two novel metrics, namely Motion Prediction Distance (MPD) and Freezing Score, to measure the coherence and freezing percentage of the generated motion. Extensive experiments show that our approach can generate realistic and coherent dance movements conditioned on both text and music while maintaining comparable performance with the two single modalities. Code will be available at: https://garfield-kh.github.io/TM2D/
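
    The quantization step that lets two datasets share one token vocabulary can be sketched as a nearest-neighbour lookup in a codebook. This is a generic vector-quantization sketch, not the TM2D code: the toy 2D codebook, the latent vectors, and the function names `quantize` and `decode` are hypothetical, and a real VQ-VAE learns its codebook jointly with an encoder and a decoder.

```python
def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    Uses squared Euclidean distance. The returned indices play the role of
    discrete motion tokens.
    """
    tokens = []
    for z in latents:
        dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def decode(tokens, codebook):
    """Replace each token by its codebook vector (the quantized latent)."""
    return [codebook[t] for t in tokens]


if __name__ == "__main__":
    # Hypothetical shared codebook with three 2D entries.
    codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
    # Hypothetical encoder outputs for clips from two different datasets.
    music_dance_latents = [(0.1, -0.1), (0.9, 1.2)]
    text_motion_latents = [(1.8, 0.1), (1.1, 0.8)]
    print(quantize(music_dance_latents, codebook))
    print(quantize(text_motion_latents, codebook))
```

    Because both sets of latents are indexed against the same codebook, clips from either dataset become sequences over one token vocabulary, which is the property the abstract relies on when it mixes motion tokens from the two datasets.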